Implementation of a Fully Balanced Periodic Tridiagonal Solver on a Parallel Distributed Memory Architecture
Abstract
While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem, a periodic tridiagonal solver, are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate these strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamics simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand sides (RHS) of the system of equations. Each RHS solution is independent and can therefore be computed in parallel. Consequently, a Gaussian-elimination-type algorithm can be used in a parallel computation, and more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy attempts to have the algorithm follow the data across processor boundaries in a chained manner. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm is presented that directly transforms a sequential Gaussian-elimination-type algorithm into the parallel, chained, load-balanced algorithm.
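To make the multiple-RHS structure concrete, the following is a minimal sequential sketch in Python. It is not taken from the paper: it assumes the common textbook combination of the Thomas algorithm (a Gaussian-elimination-type tridiagonal solve) with a Sherman-Morrison correction for the periodic corner entries, and the function names `thomas`, `solve_periodic`, and `periodic_matvec` are hypothetical. The point it illustrates is that each right-hand side is handled by an independent solve.

```python
def thomas(a, b, c, d):
    """Solve a non-periodic tridiagonal system by Gaussian elimination.

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_periodic(a, b, c, rhs_list):
    """Solve the periodic system for every RHS in rhs_list.

    Assumed layout: a[0] is the top-right corner entry and
    c[-1] is the bottom-left corner entry of the matrix.
    """
    n = len(b)
    gamma = -b[0]  # any nonzero value works; this is a common choice
    bb = list(b)
    bb[0] = b[0] - gamma
    bb[-1] = b[-1] - c[-1] * a[0] / gamma
    u = [0.0] * n
    u[0] = gamma
    u[-1] = c[-1]
    q = thomas(a, bb, c, u)  # shared by all right-hand sides
    vq = q[0] + a[0] * q[-1] / gamma
    solutions = []
    for d in rhs_list:  # each iteration is independent of the others
        y = thomas(a, bb, c, d)
        vy = y[0] + a[0] * y[-1] / gamma
        f = vy / (1.0 + vq)
        solutions.append([yi - f * qi for yi, qi in zip(y, q)])
    return solutions


def periodic_matvec(a, b, c, x):
    """Multiply the periodic tridiagonal matrix by x (used for checking)."""
    n = len(b)
    return [a[i] * x[i - 1] + b[i] * x[i] + c[i] * x[(i + 1) % n]
            for i in range(n)]
```

Under the transpose strategy described above, each processor would run the loop over `rhs_list` on its own subset of right-hand sides after the data has been redistributed. The chained, load-balanced distributed strategy is not shown here.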
Related resources
Application and Accuracy of the Parallel Diagonal Dominant Algorithm
The Parallel Diagonal Dominant (PDD) algorithm is an efficient tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is extended to solve periodic tridiagonal systems and its scalability is studied. Then the reduced PDD algorithm, which has a smaller operation count than that of the conventional sequential algorithm for many applications, is pr...
A Parallel Fast Direct Solver for Block Tridiagonal Systems with
A parallel fast direct solver based on the Divide & Conquer method for linear systems with separable block tridiagonal matrices is considered. Such systems appear, for example, when discretizing the Poisson equation in a rectangular domain using the five-point finite difference scheme or the piecewise linear finite elements on a triangulated rectangular mesh. The Divide & Conquer method has the arithm...
Implementation of a fully balanced periodic tridiagonal solver on a parallel distributed memory architecture
While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem, a periodic tridiagonal solver, are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate these strategies. The particular tridiagonal solver evaluated is us...
A Parallel Solver for Incompressible Fluid Flows
The Navier-Stokes equations describe a large class of fluid flows but are difficult to solve analytically because of their nonlinearity. We present in this paper a parallel solver for the 3-D Navier-Stokes equations of incompressible unsteady flows with constant coefficients, discretized by the finite difference method. We apply the prediction-projection method which transforms the Navier-Stoke...
Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver
The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes, and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing block-...